Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees
نویسندگان
چکیده
Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort is made to investigate the short-term subway ridership prediction considering bus transfer activities and temporal features. To fill this gap, a relatively recent data mining approach called gradient boosting decision trees (GBDT) is applied to short-term subway ridership prediction and used to capture the associations with the independent variables. Taking three subway stations in Beijing as the cases, the short-term subway ridership and alighting passengers from its adjacent bus stops are obtained based on transit smart card data. To optimize the model performance with different combinations of regularization parameters, a series of GBDT models are built with various learning rates and tree complexities by fitting a maximum of trees. The optimal model performance confirms that the gradient boosting approach can incorporate different types of predictors, fit complex nonlinear relationships, and automatically handle the multicollinearity effect with high accuracy. In contrast to other machine learning methods—or “black-box” procedures—the GBDT model can identify and rank the relative influences of bus transfer activities and temporal features on short-term subway ridership. These findings suggest that the GBDT model has considerable advantages in improving short-term subway ridership prediction in a multimodal public transportation system.
منابع مشابه
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Predicting The Type of Malaria Using Classification and Regression Decision Trees Maryam Ashoori1 *, Fatemeh Hamzavi2 1School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran 2School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran Abstract Background: Malaria is an infectious disease infecting 200 - 300 million people annually. Environme...
متن کاملAn evaluation of randomized machine learning methods for redundant data: Predicting short and medium-term suicide risk from administrative records and risk assessments
Accurate prediction of suicide risk in mental health patients remains an open problem. Existing methods including clinician judgments have acceptable sensitivity, but yield many false positives. Exploiting administrative data has a great potential, but the data has high dimensionality and redundancies in the recording processes. We investigate the efficacy of three most effective randomized mac...
متن کاملPredicting the Internet’s Evolution with Decision Trees and Lasso Logistic Regression Models
The Internet self-evolves rapidly and its dynamic structure poses many interesting questions for researchers in network analysis. In this paper I show how we can simplify the entire Internet as a mathematical graph and then extract its structural characteristics; these characteristics in turn help us build statistical models that can predict how the Internet will evolve. The data describing the...
متن کاملResearch Paper Business Analytics Predicting participant behavior and dropout in a physical activity experiment using machine learning
Physical inactivity is identified by the World Health Organization as the fourth leading risk factor for global mortality as it increases the risk of various adverse health conditions. App-based health interventions promise to help increase physical activity levels by enhancing the motivation of the users to exercise on a regular basis. One such app-based intervention which sends tailored coach...
متن کاملFinding Influential Training Samples for Gradient Boosted Decision Trees
We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric model...
متن کامل